Add Nemotron-Colembed-v2 results for Vidore V1-V3 #408

Merged
Samoed merged 3 commits into embeddings-benchmark:main from rnyak:nemotron_colembed_results
Jan 27, 2026

@rnyak
Contributor

@rnyak rnyak commented Jan 26, 2026

Added Vidore v1-v3 benchmark results for three multimodal embedding models: llama-nemotron-colembed-vl-3b-v2, nemotron-colembed-vl-4b-v2 and nemotron-colembed-vl-8b-v2

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/model_implementations/; this can be an API. Instructions on how to add a model can be found here
  • The results submitted are obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset including training splits. If I have, I have disclosed it clearly.

@github-actions

github-actions bot commented Jan 26, 2026

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: nvidia/llama-nemotron-colembed-vl-3b-v2, nvidia/nemotron-colembed-vl-4b-v2, nvidia/nemotron-colembed-vl-8b-v2
Tasks: Vidore2BioMedicalLecturesRetrieval, Vidore2ESGReportsHLRetrieval, Vidore2ESGReportsRetrieval, Vidore2EconomicsReportsRetrieval, Vidore3ComputerScienceRetrieval, Vidore3EnergyRetrieval, Vidore3FinanceEnRetrieval, Vidore3FinanceFrRetrieval, Vidore3HrRetrieval, Vidore3IndustrialRetrieval, Vidore3PharmaceuticalsRetrieval, Vidore3PhysicsRetrieval, VidoreArxivQARetrieval, VidoreDocVQARetrieval, VidoreInfoVQARetrieval, VidoreShiftProjectRetrieval, VidoreSyntheticDocQAAIRetrieval, VidoreSyntheticDocQAEnergyRetrieval, VidoreSyntheticDocQAGovernmentReportsRetrieval, VidoreSyntheticDocQAHealthcareIndustryRetrieval, VidoreTabfquadRetrieval, VidoreTatdqaRetrieval

Results for nvidia/llama-nemotron-colembed-vl-3b-v2

| task_name | nvidia/llama-nemotron-colembed-vl-3b-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|
| Vidore2BioMedicalLecturesRetrieval | 0.6319 | 0.6547 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore2ESGReportsHLRetrieval | 0.7311 | 0.7698 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-7B-v1 | False |
| Vidore2ESGReportsRetrieval | 0.5864 | 0.6244 | TomoroAI/tomoro-colqwen3-embed-4b | False |
| Vidore2EconomicsReportsRetrieval | 0.5859 | 0.6219 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | False |
| Vidore3ComputerScienceRetrieval | 0.7709 | 0.7752 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | False |
| Vidore3EnergyRetrieval | 0.6488 | 0.6841 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3FinanceEnRetrieval | 0.6423 | 0.6508 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3FinanceFrRetrieval | 0.4441 | 0.4910 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3HrRetrieval | 0.6228 | 0.6398 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3IndustrialRetrieval | 0.5171 | 0.5441 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3PharmaceuticalsRetrieval | 0.6604 | 0.6636 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3PhysicsRetrieval | 0.4693 | 0.5013 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| VidoreArxivQARetrieval | 0.9040 | 0.9380 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | True |
| VidoreDocVQARetrieval | 0.6717 | 0.6696 | VAGOsolutions/SauerkrautLM-ColQwen3-4b-v0.1 | True |
| VidoreInfoVQARetrieval | 0.9468 | 0.9492 | nvidia/llama-nemoretriever-colembed-3b-v1 | True |
| VidoreShiftProjectRetrieval | 0.9200 | 0.9293 | jinaai/jina-embeddings-v4 | False |
| VidoreSyntheticDocQAAIRetrieval | 1.0000 | 1.0000 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAEnergyRetrieval | 0.9802 | 0.9763 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAGovernmentReportsRetrieval | 0.9795 | 0.9889 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAHealthcareIndustryRetrieval | 0.9889 | 1.0000 | VAGOsolutions/SauerkrautLM-ColQwen3-4b-v0.1 | True |
| VidoreTabfquadRetrieval | 0.9725 | 0.9596 | nomic-ai/colnomic-embed-multimodal-7b | False |
| VidoreTatdqaRetrieval | 0.8104 | 0.8404 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | True |
| Average | 0.7493 | 0.7669 | nan | - |

The model has high performance on these tasks: VidoreSyntheticDocQAEnergyRetrieval, VidoreTabfquadRetrieval, VidoreDocVQARetrieval

Training datasets: HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, JinaVDRArxivQARetrieval, JinaVDRDocQAAI, JinaVDRDocQAEnergyRetrieval, JinaVDRDocQAGovReportRetrieval, JinaVDRDocQAHealthcareIndustryRetrieval, JinaVDRDocVQARetrieval, JinaVDRInfovqaRetrieval, JinaVDRTatQARetrieval, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoNQ-VN, NanoNQRetrieval, SQuAD, StackExchangeClustering, StackExchangeClustering-VN, StackExchangeClustering.v2, VDRMultilingualRetrieval, VidoreArxivQARetrieval, VidoreDocVQARetrieval, VidoreInfoVQARetrieval, VidoreSyntheticDocQAAIRetrieval, VidoreSyntheticDocQAEnergyRetrieval, VidoreSyntheticDocQAGovernmentReportsRetrieval, VidoreSyntheticDocQAHealthcareIndustryRetrieval, VidoreTatdqaRetrieval, VisRAG-Ret-Train-In-domain-data, VisRAG-Ret-Train-Synthetic-data, WebInstructSub, docmatix-ir, wiki-ss-nq
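Judging from the table, the "high performance" list for this model coincides with the tasks where its score is strictly greater than the previous max result; ties such as 1.0000 vs 1.0000 are not listed. That rule is inferred from the numbers in this comment, not from the bot's source. A minimal sketch under that assumption, using a few rows copied from the table (the function name `tasks_above_max` is hypothetical):

```python
# Rows copied from the table for nvidia/llama-nemotron-colembed-vl-3b-v2:
# task name -> (model score, previous max result)
rows = {
    "VidoreDocVQARetrieval": (0.6717, 0.6696),
    "VidoreInfoVQARetrieval": (0.9468, 0.9492),
    "VidoreSyntheticDocQAAIRetrieval": (1.0000, 1.0000),
    "VidoreSyntheticDocQAEnergyRetrieval": (0.9802, 0.9763),
    "VidoreTabfquadRetrieval": (0.9725, 0.9596),
    "VidoreTatdqaRetrieval": (0.8104, 0.8404),
}


def tasks_above_max(rows):
    """Return the tasks whose score strictly exceeds the previous max (assumed rule)."""
    return sorted(task for task, (score, prev_max) in rows.items() if score > prev_max)


new_best = tasks_above_max(rows)
print(new_best)
```

With these rows the result contains the same three tasks as the list in the comment.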


Results for nvidia/nemotron-colembed-vl-4b-v2

| task_name | nvidia/nemotron-colembed-vl-4b-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|
| Vidore2BioMedicalLecturesRetrieval | 0.6432 | 0.6547 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore2ESGReportsHLRetrieval | 0.7143 | 0.7698 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-7B-v1 | False |
| Vidore2ESGReportsRetrieval | 0.6148 | 0.6244 | TomoroAI/tomoro-colqwen3-embed-4b | False |
| Vidore2EconomicsReportsRetrieval | 0.6075 | 0.6219 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | False |
| Vidore3ComputerScienceRetrieval | 0.7856 | 0.7752 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | False |
| Vidore3EnergyRetrieval | 0.6747 | 0.6841 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3FinanceEnRetrieval | 0.6502 | 0.6508 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3FinanceFrRetrieval | 0.4901 | 0.4910 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3HrRetrieval | 0.6239 | 0.6398 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3IndustrialRetrieval | 0.5391 | 0.5441 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3PharmaceuticalsRetrieval | 0.6610 | 0.6636 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3PhysicsRetrieval | 0.4886 | 0.5013 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| VidoreArxivQARetrieval | 0.9203 | 0.9380 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | True |
| VidoreDocVQARetrieval | 0.6739 | 0.6696 | VAGOsolutions/SauerkrautLM-ColQwen3-4b-v0.1 | True |
| VidoreInfoVQARetrieval | 0.9331 | 0.9492 | nvidia/llama-nemoretriever-colembed-3b-v1 | True |
| VidoreShiftProjectRetrieval | 0.9226 | 0.9293 | jinaai/jina-embeddings-v4 | False |
| VidoreSyntheticDocQAAIRetrieval | 0.9926 | 1.0000 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAEnergyRetrieval | 0.9619 | 0.9763 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAGovernmentReportsRetrieval | 0.9802 | 0.9889 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAHealthcareIndustryRetrieval | 0.9852 | 1.0000 | VAGOsolutions/SauerkrautLM-ColQwen3-4b-v0.1 | True |
| VidoreTabfquadRetrieval | 0.9805 | 0.9596 | nomic-ai/colnomic-embed-multimodal-7b | False |
| VidoreTatdqaRetrieval | 0.8119 | 0.8404 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | True |
| Average | 0.7571 | 0.7669 | nan | - |

The model has high performance on these tasks: VidoreTabfquadRetrieval, Vidore3ComputerScienceRetrieval, VidoreDocVQARetrieval

Training datasets: JinaVDRArxivQARetrieval, JinaVDRDocQAAI, JinaVDRDocQAEnergyRetrieval, JinaVDRDocQAGovReportRetrieval, JinaVDRDocQAHealthcareIndustryRetrieval, JinaVDRDocVQARetrieval, JinaVDRInfovqaRetrieval, JinaVDRTatQARetrieval, VDRMultilingualRetrieval, VidoreArxivQARetrieval, VidoreDocVQARetrieval, VidoreInfoVQARetrieval, VidoreSyntheticDocQAAIRetrieval, VidoreSyntheticDocQAEnergyRetrieval, VidoreSyntheticDocQAGovernmentReportsRetrieval, VidoreSyntheticDocQAHealthcareIndustryRetrieval, VidoreTatdqaRetrieval, VisRAG-Ret-Train-In-domain-data, VisRAG-Ret-Train-Synthetic-data, docmatix-ir, wiki-ss-nq
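In the table for this model, the "In Training Data" column is True exactly for the tasks whose names appear in the "Training datasets" line. That this is a plain name-membership check is an assumption drawn from the values shown here, not a description of the bot's actual logic. A minimal sketch using a subset of the names:

```python
# Subset of the "Training datasets" line for nvidia/nemotron-colembed-vl-4b-v2
training_datasets = {
    "JinaVDRDocVQARetrieval",
    "VDRMultilingualRetrieval",
    "VidoreArxivQARetrieval",
    "VidoreDocVQARetrieval",
    "VidoreTatdqaRetrieval",
    "docmatix-ir",
}

# A few evaluated tasks taken from the table
tasks = [
    "Vidore3PhysicsRetrieval",
    "VidoreArxivQARetrieval",
    "VidoreDocVQARetrieval",
    "VidoreTabfquadRetrieval",
    "VidoreTatdqaRetrieval",
]

# Assumed rule: a task is flagged when its exact name is among the training datasets
in_training_data = {task: task in training_datasets for task in tasks}

for task, flag in in_training_data.items():
    print(f"{task}: {flag}")
```

Under this rule a dataset such as JinaVDRDocVQARetrieval does not flag any task by itself; only an exact task-name match does.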


Results for nvidia/nemotron-colembed-vl-8b-v2

| task_name | nvidia/nemotron-colembed-vl-8b-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|
| Vidore2BioMedicalLecturesRetrieval | 0.6616 | 0.6547 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore2ESGReportsHLRetrieval | 0.7315 | 0.7698 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-7B-v1 | False |
| Vidore2ESGReportsRetrieval | 0.6056 | 0.6244 | TomoroAI/tomoro-colqwen3-embed-4b | False |
| Vidore2EconomicsReportsRetrieval | 0.6076 | 0.6219 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | False |
| Vidore3ComputerScienceRetrieval | 0.7929 | 0.7752 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | False |
| Vidore3EnergyRetrieval | 0.6982 | 0.6841 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3FinanceEnRetrieval | 0.6729 | 0.6508 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3FinanceFrRetrieval | 0.5154 | 0.4910 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3HrRetrieval | 0.6632 | 0.6398 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3IndustrialRetrieval | 0.5603 | 0.5441 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3PharmaceuticalsRetrieval | 0.6719 | 0.6636 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3PhysicsRetrieval | 0.5084 | 0.5013 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| VidoreArxivQARetrieval | 0.9308 | 0.9380 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | True |
| VidoreDocVQARetrieval | 0.6805 | 0.6696 | VAGOsolutions/SauerkrautLM-ColQwen3-4b-v0.1 | True |
| VidoreInfoVQARetrieval | 0.9456 | 0.9492 | nvidia/llama-nemoretriever-colembed-3b-v1 | True |
| VidoreShiftProjectRetrieval | 0.9330 | 0.9293 | jinaai/jina-embeddings-v4 | False |
| VidoreSyntheticDocQAAIRetrieval | 1.0000 | 1.0000 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAEnergyRetrieval | 0.9789 | 0.9763 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAGovernmentReportsRetrieval | 0.9889 | 0.9889 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAHealthcareIndustryRetrieval | 0.9963 | 1.0000 | VAGOsolutions/SauerkrautLM-ColQwen3-4b-v0.1 | True |
| VidoreTabfquadRetrieval | 0.9774 | 0.9596 | nomic-ai/colnomic-embed-multimodal-7b | False |
| VidoreTatdqaRetrieval | 0.8337 | 0.8404 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | True |
| Average | 0.7707 | 0.7669 | nan | - |

The model has high performance on these tasks: VidoreSyntheticDocQAEnergyRetrieval, VidoreTabfquadRetrieval, VidoreShiftProjectRetrieval, Vidore3ComputerScienceRetrieval, Vidore3EnergyRetrieval, VidoreDocVQARetrieval, Vidore3PharmaceuticalsRetrieval, Vidore2BioMedicalLecturesRetrieval, Vidore3FinanceEnRetrieval, Vidore3HrRetrieval, Vidore3IndustrialRetrieval, Vidore3PhysicsRetrieval, Vidore3FinanceFrRetrieval

Training datasets: JinaVDRArxivQARetrieval, JinaVDRDocQAAI, JinaVDRDocQAEnergyRetrieval, JinaVDRDocQAGovReportRetrieval, JinaVDRDocQAHealthcareIndustryRetrieval, JinaVDRDocVQARetrieval, JinaVDRInfovqaRetrieval, JinaVDRTatQARetrieval, VDRMultilingualRetrieval, VidoreArxivQARetrieval, VidoreDocVQARetrieval, VidoreInfoVQARetrieval, VidoreSyntheticDocQAAIRetrieval, VidoreSyntheticDocQAEnergyRetrieval, VidoreSyntheticDocQAGovernmentReportsRetrieval, VidoreSyntheticDocQAHealthcareIndustryRetrieval, VidoreTatdqaRetrieval, VisRAG-Ret-Train-In-domain-data, VisRAG-Ret-Train-Synthetic-data, docmatix-ir, wiki-ss-nq
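The Average row for this model is consistent with an unweighted mean over the 22 per-task scores. The sketch below checks that for the 8b column; the scores are copied from the table, and treating the average as a plain arithmetic mean is an assumption rather than documented behaviour.

```python
# Per-task scores for nvidia/nemotron-colembed-vl-8b-v2, in table order
scores_8b = [
    0.6616, 0.7315, 0.6056, 0.6076,  # Vidore2 tasks
    0.7929, 0.6982, 0.6729, 0.5154,  # Vidore3 tasks (first four)
    0.6632, 0.5603, 0.6719, 0.5084,  # Vidore3 tasks (last four)
    0.9308, 0.6805, 0.9456, 0.9330,  # ArxivQA, DocVQA, InfoVQA, ShiftProject
    1.0000, 0.9789, 0.9889, 0.9963,  # SyntheticDocQA tasks
    0.9774, 0.8337,                  # Tabfquad, Tatdqa
]

# Unweighted mean over all tasks (assumed definition of the Average row)
average = sum(scores_8b) / len(scores_8b)
print(len(scores_8b), round(average, 4))
```

Rounded to four decimals, the mean matches the 0.7707 reported in the Average row.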


@rnyak
Contributor Author

rnyak commented Jan 27, 2026

@KennethEnevoldsen Hello. Can you rerun the failing test? Our MTEB MR was merged. Thanks.

@Samoed
Member

Samoed commented Jan 27, 2026

I tried to rerun it, but a new version needs to be released before the test can run. I'll try to use the main repo instead.

@Samoed Samoed enabled auto-merge (squash) January 27, 2026 18:24
@Samoed Samoed merged commit 5ef8a88 into embeddings-benchmark:main Jan 27, 2026
3 checks passed
@rnyak
Contributor Author

rnyak commented Jan 29, 2026

@Samoed @KennethEnevoldsen Thanks for merging our MR. I'd like to follow up on the leaderboard (LB) update. We still do not see our three models' results on the ViDoRe LB for the public tasks. Could you let us know if an update is expected soon?

FYI, I also created this issue to request evaluation on the private tasks. Thanks.

@Samoed
Member

Samoed commented Jan 29, 2026

Yeah, I found that our Docker image is not being built every time. I'll fix this. I saw your request to run private models. I'll run them tomorrow.

@Samoed
Member

Samoed commented Jan 29, 2026

Leaderboard updated.
